Smart Paper Notes

HOTCOLD Block: Fooling Thermal Infrared Detectors with a Novel Wearable Design

Hui Wei , Zhixiang Wang , Xuemei Jia , Yinqiang Zheng , Hao Tang , Shin'ichi Satoh , Zheng Wang

Category: Computer Vision

2022-12-12

Adversarial attacks on thermal infrared imaging expose the risks of related applications. Estimating the security of these systems is essential for safely deploying them in the real world. In many cases, realizing the attacks in the physical space requires elaborate special perturbations. These solutions are often impractical and attention-grabbing. To address the need for a physically practical and stealthy adversarial attack, we introduce HotCold Block, a novel physical attack for infrared detectors that hides persons using wearable Warming Paste and Cooling Paste. By attaching these readily available temperature-controlled materials to the body, HotCold Block evades human eyes efficiently. Moreover, unlike existing methods that build adversarial patches with complex texture and structure features, HotCold Block utilizes an SSP-oriented adversarial optimization algorithm that enables attacks with pure color blocks and explores the influence of size, shape, and position on attack performance. Extensive experimental results in both digital and physical environments demonstrate the performance of our proposed HotCold Block. Code is available: https://github.com/weihui1308/HOTCOLDBlock

Translated by Google Translate

Rethinking the Structure of Stochastic Gradients: Empirical and Statistical Evidence

Zeke Xie , Qian-Yuan Tang , Zheng He , Mingming Sun , Ping Li

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-05

Stochastic gradients closely relate to both optimization and generalization of deep neural networks (DNNs). Some works attempted to explain the success of stochastic optimization for deep learning by the arguably heavy-tail properties of gradient noise, while other works presented theoretical and empirical evidence against the heavy-tail hypothesis on gradient noise. Unfortunately, formal statistical tests for analyzing the structure and heavy tails of stochastic gradients in deep learning are still under-explored. In this paper, we mainly make two contributions. First, we conduct formal statistical tests on the distribution of stochastic gradients and gradient noise across both parameters and iterations. Our statistical tests reveal that dimension-wise gradients usually exhibit power-law heavy tails, while iteration-wise gradients and stochastic gradient noise caused by minibatch training usually do not exhibit power-law heavy tails. Second, we further discover that the covariance spectra of stochastic gradients have power-law structures in deep learning. While previous papers believed that the anisotropic structure of stochastic gradients matters to deep learning, they did not expect that the gradient covariance could have such an elegant mathematical structure. Our work challenges the existing belief and provides novel insights on the structure of stochastic gradients in deep learning.
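
To make the phrase "power-law structure of a spectrum" concrete, here is a minimal sketch that estimates a power-law exponent by fitting a straight line to log(eigenvalue) against log(rank). This is only an illustration under stated assumptions: the function name `fit_power_law_exponent` and the synthetic spectrum are hypothetical, and a log-log regression is a rough diagnostic, not the formal statistical tests the abstract refers to.

```python
import math


def fit_power_law_exponent(values):
    """Fit log(value) = c - s * log(rank) by least squares.

    Returns (s, r_squared). Hypothetical helper for illustration only.
    """
    # Sort positive values in decreasing order so that rank 1 is the largest.
    ordered = sorted((v for v in values if v > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(v) for v in ordered]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)

    slope = sxy / sxx
    # Squared correlation: close to 1 when the log-log plot is nearly linear.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -slope, r_squared


# Synthetic spectrum (an assumption, not data from the paper): lambda_k = k^(-1.5).
spectrum = [rank ** -1.5 for rank in range(1, 201)]
exponent, r_squared = fit_power_law_exponent(spectrum)
```

On real data the input would be the eigenvalues of an estimated gradient covariance matrix; a high `r_squared` only suggests a power law and does not replace a goodness-of-fit test.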

Learning-based Inverse Rendering of Complex Indoor Scenes with Differentiable Monte Carlo Raytracing

Jingsen Zhu , Fujun Luan , Yuchi Huo , Zihao Lin , Zhihua Zhong , Dianbing Xi , Jiaxiang Zheng , Rui Tang , Hujun Bao , Rui Wang

Categories: Computer Vision | Artificial Intelligence

2022-11-06

Indoor scenes typically exhibit complex, spatially-varying appearance from global illumination, making inverse rendering a challenging ill-posed problem. This work presents an end-to-end, learning-based inverse rendering framework incorporating differentiable Monte Carlo raytracing with importance sampling. The framework takes a single image as input to jointly recover the underlying geometry, spatially-varying lighting, and photorealistic materials. Specifically, we introduce a physically-based differentiable rendering layer with screen-space ray tracing, resulting in more realistic specular reflections that match the input photo. In addition, we create a large-scale, photorealistic indoor scene dataset with significantly richer details like complex furniture and dedicated decorations. Further, we design a novel out-of-view lighting network with uncertainty-aware refinement leveraging hypernetwork-based neural radiance fields to predict lighting outside the view of the input photo. Through extensive evaluations on common benchmark datasets, we demonstrate superior inverse rendering quality of our method compared to state-of-the-art baselines, enabling various applications such as complex object insertion and material editing with high fidelity. Code and data will be made available at https://jingsenzhu.github.io/invrend

MONAI: An open-source framework for deep learning in healthcare

M. Jorge Cardoso , Wenqi Li , Richard Brown , Nic Ma , Eric Kerfoot , Yiheng Wang , Benjamin Murrey , Andriy Myronenko , Can Zhao , Dong Yang

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-11-04

Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of the medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.

Physical Adversarial Attack meets Computer Vision: A Decade Survey

Hui Wei , Hao Tang , Xuemei Jia , Hanxun Yu , Zhubo Li , Zhixiang Wang , Shin'ichi Satoh , Zheng Wang

Category: Computer Vision

2022-09-30

Although Deep Neural Networks (DNNs) have achieved impressive results in computer vision, their exposed vulnerability to adversarial attacks remains a serious concern. A series of works has shown that adding elaborate perturbations to images can cause catastrophic degradation in DNN performance metrics. This phenomenon exists not only in the digital space but also in the physical space. Therefore, estimating the security of these DNN-based systems is critical for safely deploying them in the real world, especially for security-critical applications, e.g., autonomous cars, video surveillance, and medical diagnosis. In this paper, we focus on physical adversarial attacks and provide a comprehensive survey of over 150 existing papers. We first clarify the concept of the physical adversarial attack and analyze its characteristics. Then, we define the adversarial medium, which is essential for performing attacks in the physical world. Next, we present the physical adversarial attack methods in task order: classification, detection, and re-identification, and introduce their performance in solving the trilemma: effectiveness, stealthiness, and robustness. In the end, we discuss the current challenges and potential future directions.

Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward

Yunlong Tang , Siting Xu , Teng Wang , Qin Lin , Qinglin Lu , Feng Zheng

Categories: Computer Vision | Artificial Intelligence

2022-09-25

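Ad video editing aims to automatically edit advertising videos into shorter videos while retaining the coherent content and key information conveyed by advertisers. It mainly contains two stages: video segmentation and segment assemblage. Existing methods perform well at the video segmentation stage but depend on extra cumbersome models and perform poorly at the segment assemblage stage. To address these problems, we propose M-SAN (Multi-modal Segment Assemblage Network), which can perform the segment assemblage task efficiently and coherently. It utilizes multi-modal representations extracted from the segments and follows the encoder-decoder Ptr-Net framework with an attention mechanism. An importance-coherence reward is designed for training M-SAN. We experiment on the Ads-1k dataset, which contains more than 1,000 videos collected from advertisers under rich ad scenarios. To evaluate the methods, we propose a unified metric, Imp-Coh@Time, which comprehensively assesses the importance, coherence, and duration of the outputs at the same time. Experimental results show that our method achieves better performance on this metric than random selection and the previous method. Ablation experiments further verify that the multi-modal representation and the importance-coherence reward significantly improve performance. The Ads-1k dataset is available at: https://github.com/yunlong10/ads-1k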

Faith: An Efficient Framework for Transformer Verification on GPUs

Boyuan Feng , Tianqi Tang , Yuke Wang , Zhaodong Chen , Zheng Wang , Shu Yang , Yuan Xie , Yufei Ding

Category: Machine Learning

2022-09-23

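Transformer verification is drawing increasing attention in machine learning research and industry. It formally verifies the robustness of Transformers against adversarial attacks such as exchanging words with synonyms. However, the performance of Transformer verification is still unsatisfactory because of its bound-centric computation, which differs significantly from that of standard neural networks. In this paper, we propose Faith, an efficient framework for Transformer verification on GPUs. First, we propose a semantic-aware computation graph transformation to identify semantic information such as the bound computation in Transformer verification. We exploit such semantic information to enable efficient kernel fusion at the computation graph level. Second, we propose a verification-specialized kernel crafter to efficiently map Transformer verification to modern GPUs. This crafter exploits a set of GPU hardware supports to accelerate verification-specialized operations, which are usually memory-intensive. Third, we propose an expert-guided autotuning to incorporate expert knowledge on GPU backends and facilitate the exploration of a large search space. Extensive evaluations show that Faith achieves a 2.1× to 3.4× (2.6× on average) speedup over state-of-the-art frameworks.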
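
As an illustration of what a "bound computation" looks like in robustness verification, the sketch below propagates elementwise lower and upper bounds through one linear layer followed by a ReLU, using plain interval arithmetic. This is an assumption made for illustration: the abstract does not say which bounding scheme Faith accelerates, verifiers often use tighter relaxations than intervals, and the function names and toy numbers here are hypothetical.

```python
def linear_interval_bounds(weights, bias, lower, upper):
    """Propagate input bounds [lower, upper] through y = W x + b.

    For each weight, the lower output bound takes the input's lower bound
    when the weight is non-negative and the upper bound otherwise; the
    upper output bound does the opposite.
    """
    out_lower, out_upper = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def relu_interval_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


# Toy layer and input (hypothetical values), perturbed by at most eps per coordinate.
W = [[1.0, -2.0], [0.5, 1.0]]
b = [0.0, -1.0]
x = [1.0, 2.0]
eps = 0.1

lo, hi = linear_interval_bounds(W, b, [v - eps for v in x], [v + eps for v in x])
lo, hi = relu_interval_bounds(lo, hi)
```

Every layer carries two tensors (lower and upper bound) instead of one activation, which gives a sense of why such workloads can be memory-intensive, as the abstract notes.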

SDFE-LV: A Large-Scale, Multi-Source, and Unconstrained Database for Spotting Dynamic Facial Expressions in Long Videos

Xiaolin Xu , Yuan Zong , Wenming Zheng , Yang Li , Chuangao Tang , Xingxun Jiang , Haolin Jiang

Category: Computer Vision

2022-09-18

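In this paper, we present SDFE-LV, a large-scale, multi-source, and unconstrained database for spotting the onset and offset frames of complete dynamic facial expressions in long videos. This task is known as dynamic facial expression spotting (DFES) and is an important step for many facial expression analysis tasks. Specifically, SDFE-LV consists of 1,191 long videos, each of which contains one or more complete dynamic facial expressions. Moreover, each complete dynamic facial expression in its corresponding long video was independently labeled five times by 10 well-trained annotators. To the best of our knowledge, SDFE-LV is the first unconstrained large-scale database for the DFES task whose long videos are collected from multiple real-world or close-to-real-world media sources, e.g., TV interviews, documentaries, movies, and we-media short videos. Therefore, DFES tasks on the SDFE-LV database will encounter many difficulties in practice, such as head pose changes, occlusion, and illumination. We also provide a comprehensive benchmark evaluation from different angles using many recent deep spotting methods, so that researchers interested in DFES can get started quickly and easily. Finally, through an in-depth discussion of the experimental evaluation results, we attempt to point out several meaningful directions for dealing with DFES tasks and hope that DFES can be better advanced in the future. In addition, SDFE-LV will be freely released, for academic use only, as soon as possible.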

Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes

Min Zhang , Hongyao Tang , Jianye Hao , Yan Zheng

Categories: Machine Learning | Neural and Evolutionary Computing

2022-09-16

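At the heart of intelligent decision-making systems, how a policy is represented and optimized is a fundamental problem. The root challenge of this problem is the large scale and high complexity of the policy space, which exacerbates the difficulty of policy learning, especially in real-world scenarios. Towards a desirable surrogate policy space, policy representation in a low-dimensional latent space has recently shown its potential for improving both the evaluation and the optimization of policies. The key question in these studies is by what criterion we should abstract the policy space for the desired compression and generalization. However, both the theory of policy abstraction and the methodology of policy representation learning are less studied in the literature. In this work, we make an initial effort to fill this gap. First, we propose a unified policy abstraction theory containing three types of policy abstraction associated with policy features at different levels. Then, we generalize them to three policy metrics that quantify the distance (i.e., similarity) between policies, for more convenient use in learning policy representations. Further, we propose a policy representation learning approach based on deep metric learning. For the empirical study, we investigate the efficacy of the proposed policy metrics and representations in characterizing policy difference and conveying policy generalization, respectively. Our experiments are conducted on both policy optimization and policy evaluation problems, covering trust-region policy optimization (TRPO), diversity-guided evolution strategy (DGES), and off-policy evaluation (OPE). Somewhat naturally, the experimental results indicate that there is no universally optimal abstraction for all downstream learning problems, while the influence-irrelevance policy abstraction can be a generally preferred choice.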

M^2-3DLaneNet: Multi-Modal 3D Lane Detection

Yueru Luo , Xu Yan , Chaoda Zheng , Chao Zheng , Shuqi Mei , Tang Kun , Shuguang Cui , Zhen Li

Category: Computer Vision

2022-09-13

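Estimating accurate lane lines in 3D space remains challenging because of their sparse and slim nature. In this work, we propose M^2-3DLaneNet, a multi-modal framework for effective 3D lane detection. Aiming to integrate complementary information from multiple sensors, M^2-3DLaneNet first extracts multi-modal features with modality-specific backbones and then fuses them in a unified bird's-eye view (BEV) space. Specifically, our method consists of two core components. 1) To achieve accurate 2D-3D mapping, we propose top-down BEV generation. Within it, a Line-Restricted Deform-Attention (LRDA) module is used to effectively enhance image features in a top-down manner, fully capturing the slender features of lanes. It then casts the 2D pyramidal features into 3D space using depth-aware lifting and generates BEV features through pillarization. 2) We further propose bottom-up BEV fusion, which aggregates multi-modal features through multi-scale cascaded attention, integrating complementary information from the camera and LiDAR sensors. Extensive experiments demonstrate the effectiveness of M^2-3DLaneNet, which outperforms previous state-of-the-art methods with a 12.1% F1-score improvement on the OpenLane dataset.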
